An Efficient Interpolation Technique for Jump Proposals in Reversible-Jump Markov 

Chain Monte Carlo Calculations 
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Selection among alternative theoretical models given an observed data set is an important chal- 
lenge in many areas of physics and astronomy. Reversible- jump Markov chain Monte Carlo (RJM- 
CMC) is an extremely powerful technique for performing Bayesian model selection, but it suffers 
from a fundamental difficulty: it requires jumps between model parameter spaces, but cannot retain 
a memory of the favored locations in more than one parameter space at a time. Thus, a naive 
jump between parameter spaces is unlikely to be accepted in the MCMC algorithm and convergence 
is correspondingly slow. Here we demonstrate an interpolation technique that uses samples from 
single-model MCMCs to propose inter-model jumps from an approximation to the single-model pos- 
terior of the target parameter space. The interpolation technique, based on a kD-tree data structure, 
is adaptive and efficient in arbitrary dimensions. We show that our technique leads to dramatically 
improved convergence over naive jumps in an RJMCMC, and compare it to other proposals in the 
literature to improve the convergence of RJMCMCs. We also discuss the use of the same inter- 
polation technique in two other contexts: as a convergence test for a single-model MCMC and as 
a way to construct efficient "global" proposal distributions for single-model MCMCs without prior 
knowledge of the structure of the posterior distribution. 

PACS numbers: 02.70.Tt, 95.75.Pq, 07.05.Kf 



I. INTRODUCTION 

Selection among alternative theoretical models given 
an observed data set is an important challenge in many 
areas of physics and astronomy. In a Bayesian context, 
model selection involves computing the evidence for each 
model given the data. The model evidence is an inte- 
gral of the unnormalized posterior probability distribu- 
tion over the model parameter space, representing the 
probability of obtaining the data set within that model. 
Models with larger evidence are preferred; the ratio of 
the evidences of two models is the Bayes factor between 
them. The product of the Bayes factor and the ratio 
of prior probabilities for the two models yields the odds 
ratio for the models. 

There are many ways to compute model evidences. 
In low-dimensional parameter spaces, the unnormalized 
posterior probability can be evaluated on a grid or lat- 
tice and the integral can be performed directly. For many 
problems or models of interest, however, the dimensional- 
ity of parameter space is too large to make this approach 
practical, and stochastic sampling must be used. 

Markov Chain Monte Carlo (MCMC) methods at- 
tempt to stochastically produce parameter samples with 
density proportional to the posterior probability distri- 
bution. In MCMC techniques, the primary target is an 
accurate estimate of the posterior distribution. (We note 
that an alternative stochastic method for exploring a 
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model parameter space, nested sampling (TMl], focuses 
on evidence computation rather than sampling the pos- 
terior probability density functions.) It is not straightfor- 
ward to compute the model evidence from MCMC sam- 
ples. The most direct way to estimate the evidence for a 
model from MCMC samples is to compute the harmonic- 
mean estimator, but this estimator of the evidence suf- 
fers from infinite varianceJ3-(y| . MCMC implementations 
with parallel tempering [7| allow for evidence computa- 
tion via thermodynamic integration, but these can be 
computationally costly. 

Reference Q gives a method for directly computing the 
evidence integral from existing MCMC samples by using 
a kD-tree data structure to decompose a parameter space 
into boxes containing the MCMC sample points. The in- 
tegral is approximated as a sum over box volumes. This 
method is promising, but it is not clear in general what 
statistical and systematic errors it introduces and how 
these are affected by the shape of the posterior distribu- 
tion from which the MCMC samples. 

When the goal is model selection between several 
known models, only the relative evidence of each model 
is needed. In this circumstance, the Reversible Jump 
MCMC technique first introduced in Reference @ is one 
of the most reliable and accurate ways to compare the 
models. Reversible Jump MCMC (RJMCMC), described 
more fully in Section [Til performs a standard MCMC in 
a superspace that is a direct sum of all the model pa- 
rameter spaces. Such an MCMC involves both intra- 
and inter-model jumps; the number of MCMC samples 
in each model's parameter space is proportional to that 
model's relative evidence in the suite of models being 
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compared. 

Implemented naively, RJMCMC has a significant 
drawback: because the chain of samples must be Marko- 
vian, only the current sample is available to the algo- 
rithm as it is choosing the next sample. Each time an 
RJMCMC transitions between models, the information 
about the choices of parameter values in the previous 
model is lost; subsequent jumps into that model must 
"start fresh," and are correspondingly unlikely to be ac- 
cepted, delaying convergence of the RJMCMC sample 
chain. Reference addressed this issue by proposing 
a new method for producing inter-model jumps in an 
RJMCMC that relies on interpolating single-model pos- 
terior distributions using a box decomposition of param- 
eter space. 

Here we introduce an alternative technique for im- 
proving the acceptance ratio of inter-model jumps in 
an RJMCMC, leading to dramatically improved conver- 
gence of RJMCMC sample chains. The technique uses a 
kD-tree data structure to construct an approximation to 
each model's posterior parameter distribution. We draw 
jump proposals into the model from this approximation 
to its posterior. Because jumps are proposed preferen- 
tially to locations favored by the single-model posterior, 
the RJMCMC compares "good" locations in parameter 
space across all the models, and convergence is gener- 
ally rapid. We have successfully applied this RJMCMC 
technique to a 10- way model selection among alternative 
mass distribution models for black-hole X-ray binaries 

The method of Reference [ic| for producing inter- 
model jumps in an RJMCMC relies on a box decompo- 
sition of parameter space, using fixed-sized boxes. The 
method cannot adapt to the local structure of the pos- 
terior, and becomes asymptotically inefficient for high- 
dimensional parameter spaces or highly peaked poste- 
riors. Meanwhile, the approximation to the posterior 
distribution produced by the kD-tree is a constant-in- 
box interpolation of the posterior, similar in spirit to 
the phase-space density interpolants produced from N- 
body positions and momenta in Reference [l2j. The kD- 
tree interpolation is effective in parameter spaces of arbi- 
trary dimensionality, and is quite space-efficient, requir- 
ing O (N) storage space and O (log N) time to produce 
each proposed jump, where N is the number of single- 
model MCMC samples used to construct the interpola- 
tion. 

The structure of this paper is as follows. In Section [TT1 
wc introduce in more detail the concept of a Reversible 
Jump MCMC, and describe the fundamental difficulty 
with a naive jump proposal in an RJMCMC. In Section 
IIIII we introduce the kD-tree data structure used to de- 
compose the parameter space into boxes for interpola- 
tion. In Section IIVI we demonstrate the efficiency gains 
that are achieved from use of the interpolated jump pro- 
posal. In Section [V] we give examples of some other uses 
of the interpolated jump proposal, and suggest its utility 
in the context of a single-model MCMC. Finally, in Sec- 



tion [VT]wc offer a summary and some concluding remarks 
on the method. 



II. REVERSIBLE JUMP MCMC 

Reversible jump Markov chain Monte Carlo (RJM- 
CMC) @ is a technique for Bayesian model comparison. 
Below, we give a very brief introduction to Bayesian anal- 
ysis, describe a standard MCMC, and introduce RJM- 
CMC. 



A. Bayesian analysis 

Consider an observed data set d and a set of com- 
peting models for the data, indexed by an integer i: 
{Mi\i = 1,2,.. .}. Each model has some continuous pa- 
rameters, 9i] given the model and its parameters, we can 
make a prediction about the likelihood of observing the 
experimental data: L(d\9i,Mi). Within the framework 
of each model, Bayes' rule gives us a way to compute the 
posterior probability distribution function (PDF) for the 
model parameters implied by the data: 



p(0i\d,Mi) 



LMfli.A/iMflilMi) 
p(d\M t ) 



(1) 



where p(9 i \d,M i ) is the posterior distribution for the 
model parameters 9i implied by the data in the con- 
text of model Mj, p(0j|Mj) is the prior probability of 
the model parameters that represents our beliefs before 
accumulating any of the data d, and p(d\Mi), called the 
evidence, is an overall normalizing constant that ensures 
that p(9i\d, Mi) is properly normalized as a probability 
distribution on the This implies that the evidence is 
equal to 



p(d\Mi)= / d9 l L{d\9 t ,M l )p(9 l \M l ) 1 (2) 

where Vi is the parameter space volume in model Mi. 
For model comparison, we are interested in the posterior 
probability of a particular model, Mj, given the data, 
p(Mi\d). Using Bayes' rule, we see that this involves the 
evidence (Eq. ©): 



p(M z \d) 



p(d) ' 



(3) 



where p{Mi) is our a priori belief in model M, and p(d) 
is a normalizing constant, 



p(d) = J2p(d\M i )p(M i ). 



(4) 



When selecting among alternative models, we are in- 
terested in finding the model with the highest poste- 
rior probability p(Mi\d). However, attempts to directly 
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compute the evidence by performing the integration in 
Eq. @ are generally very difficult in a multi-dimensional, 
multi-modal parameter space when the likelihood has to 
be evaluated numerically. In particular, a grid-based in- 
tegral quickly becomes computationally unfeasible as the 
dimensionality of 9 exceeds a few. The parameter space 
must typically be explored in a stochastic manner before 
the evidence integral can be computed. There are sev- 
eral stochastic parameter-exploration techniques focused 
directly on evidence computation (e.g., nested sampling 
[3, and its variant MultiNest 0). Although nested 
sampling can be used to compute the posterior PDFs 
within each model along with the evidences for the var- 
ious models, the most common technique for computing 
posterior PDFs in the context of a model is the Markov 
chain Monte Carlo, which we now describe. 



B. Markov chain Monte Carlo 

A Markov chain Monte Carlo |l3| produces a set of 
samples | £ = 1, . . .]■ from the model parameter space 
that are sampled according to the posterior, meaning 
that, in the limit that the chain length tends to infinity, 
the relative frequency with which a given set of parameter 
appears in the chain is proportional to the desired poste- 
rior, p{9\d, M). Therefore, the output of an MCMC can 
be directly interpreted as the posterior PDF over the full 
parameter space, while PDFs for individual parameters 
can be obtained by marginalizing over the uninteresting 
parameters. 

A Markov chain has the property that the probability 
distribution of the next state can depend only on the 
current state, not on the past history: 

P {9 {3+1) )= [ dfiripiftri -> S^ +l ^)p0^), (5) 



where the jump probability p(9^> — > 6^ +1 ') depends 
only on and 9^ +1 \ An additional requirement for 
an MCMC arises from the fact that the desired distri- 
bution is the equilibrium distribution. In other words, 
if we assume that state (j) of the chain is sampled from 
the desired PDF, p(9^) = p(9^\d,M) , then the next 
state (j + 1) must be sampled from the PDF as well, so 
that = p{6^ +r >\d,M); this condition is known 

as "detailed balance." 

One way to produce such a sequence of samples is via 
the Me trop olis-Hastings algorithm, first proposed in Ref- 
erence and later generalized by Hastings fl5j : 

1. Given a current state 9^\ propose the next state 
9 P by drawing from a jump proposal distribution 
with probability Q(6& -> 6 P ). 

2. Compute the probability of accepting the proposed 



jump as 

_ . / P (9 p \d) Q(gp->gb-)) ^ 

Paccopt = mm 1 , — =r- . (6) 

V p(60)\d) Q(90) ->0p)/ 

3. Pick a uniform random number a £ [0,1]. If a < 
Paccept, accept the proposed jump, setting 9^ +1 ^ = 
9 P . Otherwise, reject the jump, and remain at the 
same location in parameter space for the next step, 

This jump proposal distribution Q{9^ — > 9 P ) can de- 
pend on the parameters of the current state 9^\ but 
not on the past history. It must also allow any state 
within the prior volume to be reachable (eventually) by 
the MCMC. Any jump proposal that satisfies these prop- 
erties is suitable for an MCMC. 

The jump proposal is the most important choice in the 
MCMC, as it determines the sampling efficiency of the 
algorithm, i.e., the length of the chain before it converges 
to the posterior PDF. Creating an efficient jump proposal 
distribution requires an understanding of the structure of 
the parameter space which may not be available until the 
PDFs arc found, creating a Catch-22; one possibility for 
resolving this infinite loop is described in Section Ivl 

It should be noted that although an MCMC whose 
jump acceptance criterium obeys detailed balance (as 
the Metropolis- Hastings algorithm does) must eventually 
converge to the desired distribution, there is no way to 
guarantee convergence in a fixed number of steps or to 
test whether a chain has converged in a foolproof manner. 
For example, MCMC chains can get stuck on local max- 
ima, producing an apparently well-converged sampling 
of the PDF in the vicinity of the maximum; or, if the 
chain visits a sequence of local maxima, moving rarely 
between maxima, the autocorrelation length of the chain 
may represent a substantial fraction of the total num- 
ber of samples, resulting in an effective sample size that 
is too small to accurately represent the relative sizes of 
the modes in the PDF (however, see Section [V] for one 
intriguing suggestion for remedying this issue). 

Finally, we note that, in practice, the randomly chosen 
initial starting point of the MCMC may be in a partic- 
ularly unlikely location in the parameter space. Because 
jumps arc frequently local, we will generally want to ig- 
nore the early points in a finite-size chain to avoid biases 
in the recovered posterior PDF due to the choice of the 
initial location. The points thus discarded are referred 
to as "burn-in" points. 



C. RJMCMC 

The samples produced by an MCMC algorithm can be 
used to directly perform a Monte Carlo evidence inte- 
gral. This results in a harmonic mean estimator for the 
evidence, which suffers from a large variance and bias 
Additional techniques for the direct integration of 
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evidence, also based on a kD tree decomposition of the 
parameter space (see Sec. are described in Q. These 
techniques are promising, but in some cases suffer (Tl| 
from large variance and bias. An alternative approach to 
model selection among a set of models is based on per- 
forming an MCMC in a "super-model" that encompasses 
all of the models under consideration; this is known the 
the Reversible Jump Markov chain Monte Carlo (RJM- 
CMC). 

The parameter space of the super-model in an RJM- 
CMC consists of a discrete parameter that identifies the 
model, Mi, and a set of continuous parameters appro- 
priate for that model, ftj. Thus, each sample consists 
of a model identifier and a location within the param- 
eter space of that model, {Mi,6i}. We perform the 
MCMC in the "super-model" parameter space just like 
a regular MCMC; we propose jumps to different param- 
eters within a model (intramodel jumps) and jumps to 
a different model with different parameters (intermodel 
jumps). The resulting chain samples from the posterior 
p(M i: 9 l \d). As in a usual MCMC, the PDF on the model 
as a parameter, with other parameters ignored, is ob- 
tained by marginalizing over the remaining parameters. 
The posterior probability of a model is proportional to 
the number of counts 

MA- j (7) 

where Ni is the number of RJMCMC samples listing the 
i'th model and N is the total chain length. Thus, the 
probability of a particular model relative to other models 
under consideration is given by the fraction of RJMCMC 
samples lying in the parameter space of that model. 

The main difficulty of achieving an efficient RJMCMC 
is finding a good jump proposal distribution for inter- 
model jumps. In order to have relatively high accep- 
tance ratios for intermodel jumps, which is necessary for 
efficient mixing between models, jumps should be pref- 
erentially proposed into regions with a high posterior. 
However, because the algorithm is Markovian, it has no 
past memory, so a jump proposed into a model from out- 
side can not access information from earlier in the chain 
which may identify a posterior peak. 

The way to solve this problem is to identify a good 
jump proposal distribution in advance, by exploiting in- 
formation from single-model MCMCs to generate effi- 
cient jump proposal distributions for our reversible jump 
MCMC. (Single-model MCMCs can take small local 
jumps within their model, meaning that they are much 
less likely than an RJMCMC to lose a high-posterior 
mode once it has been located.) The ideal jump pro- 
posal distribution for the parameters within a model 
would consist of the posterior PDF for those parameters, 
p(9i\Mi,d), and single- model MCMCs already represent 
samples from these posterior PDFs. However, the sam- 
ples are discrete, and a jump proposal must be continu- 
ous. Therefore, the output of each single-model MCMC 



must be interpolated to construct the desired jump pro- 
posal. The novel strategy we propose for efficiently in- 
terpolating a discretely sampled PDF is described in the 
next section. 



III. KD TREES AND INTERPOLATION 

The problem of drawing a proposed jump from an in- 
terpolation of single- model MCMC points can be thought 
of as the problem of assigning a local "neighborhood" to 
each point in the chain of MCMC samples. We choose 
these neighborhoods to be non-overlapping and fill pa- 
rameter space. To draw a proposed jump, we select a 
point uniformly from the MCMC samples, find its asso- 
ciated neighborhood, and then draw the proposed jump 
uniformly from the neighborhood. Since the MCMC 
points are distributed according to the posterior PDF 
for the single model, this procedure produces proposed 
jumps that are approximately distributed according to 
the posterior PDF. The size of a neighborhood is in- 
versely proportional to the local point density. The pro- 
posed jumps are drawn from a piecewise-constant (con- 
stant on each neighborhood) interpolation of the PDF. 
There are various techniques that could be used to con- 
struct the set of neighborhoods associated with each 
point. 

Ref. [T(| decomposed the parameter space into 
constant-volume "bricks" whose size was set by the typi- 
cal size of the peaks of the PDF. Each sample was associ- 
ated with the brick that contained it, and the probability 
of proposing a jump into a particular brick is thus pro- 
portional to the number of samples within that brick. 
Additionally, an extra uniform jump proposal was added 
to allow for jumps into bricks that did not contain any 
points, so that the jump proposal covers the entire model 
parameter space. However, the bricks in this algorithm 
do not adapt to the local structure of the PDF. One must 
cither use small bricks to capture the local structure of 
the PDF, placing many bricks in regions without MCMC 
samples (which can increase memory management and 
access costs), or use large bricks, missing the local struc- 
ture of the PDF in exchange for fewer empty bricks. 

An alternate technique for producing adaptive neigh- 
borhoods would be to use the Voronoi regions [lU associ- 
ated with each MCMC sample. The Voronoi region asso- 
ciated with a sample contains all the parameter space 
points that are closer to that sample than any other 
sample. The Voronoi region decomposition into neigh- 
borhoods is, in a sense, maximally adaptive, in contrast 
to the approach of Ref. [1(3], which is minimally adap- 
tive. Unfortunately, defining the Voronoi regions requires 
a metric on parameter space, which may be difficult or 
impossible to define. Also, the computational cost for 
computing the Voronoi regions increases rapidly with di- 
mensionality. 

Here we propose to use a decomposition of the param- 
eter space into neighborhoods based on a data structure 
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called a kD-tree (see, e.g. [17| or [HI). The decomposi- 
tion is more adaptive than the cells of Ref. [To[ , and more 
efficient in high-dimensional spaces than the Voronoi de- 
composition. 

A kD-trce is a binary, space-partitioning tree. To par- 
tition a set of points into a kD-tree, begin by placing 
them in a rectangular box that contains all of parameter 
space. Then proceed recursively 

1. If the given box contains exactly one point, stop; 
this is a leaf of the tree. Otherwise: 

2. Choose a dimension along which to divide the 
points. Divide the points in half along this di- 
mension (or nearly in half, if the number of points 
is odd), forming two sub-boxes. The "left" sub- 
box contains the half (or nearly half) of the points 
that have small coordinates along the chosen di- 
mension; the "right" sub-box contains the half (or 
nearly half) of the points that have large coordi- 
nates along the chosen dimension. 

3. Return to Step 1 with each of the sub-boxes, storing 
the resulting trees as sub-trees of the current box. 

The key algorithmic step in the production of a kD-tree is 
finding the median point along a given dimension in order 
to divide the points in half in Step 2. For n points, this 
can be accomplished in O (n) time (see, e.g., Ref. |19|). 
If there are N points in total, there are O (log N) levels 
in the tree; at each level, O (N) points must be processed 
once in the median-finding algorithm. Tree construction 
thus costs O(iVlog-ZV) in time, and the tree consumes 
O (N) space. As an illustration, box boundaries for a kD- 
tree constructed around a point set that is normally dis- 
tributed around the origin in two dimensions are shown 
in Figure [TJ 

In order to use the kD-tree interpolation as a jump 
proposal in an MCMC, we must be able to quickly find 
the neighborhood associated with a given point to com- 
pute the jump probability (see Eq. [5]). This can be ac- 
complished in O(logiV) time and constant space with 
the following algorithm, which is a standard binary tree 
search. Given the point, Oi and the tree, T: 

1. If T contains exactly one sample point, then its box 
is the associated neighborhood. Otherwise: 

2. The tree T has two sub-trees. If the point Oi is con- 
tained in the "left" sub-tree, then return to Step 1, 
considering this sub-tree; otherwise return to Step 
1, considering the "right" sub-tree. 



1 The kD-trcc datastructure defined here places box boundaries 
between the points. An alternate definition common in the lit- 
erature places box boundaries on the median point, but such a 
definition is inconvenient for our purposes. 




FIG. 1. The neighborhoods from a kD-tree constructed 
around a set of points that are normally distributed about 
the origin in two dimensions. As the points become denser 
around the origin, the typical neighborhood gets smaller. The 
interpolated PDF within a box of volume Vi is l/(NVi), where 
N is the total number of points (which is also the number of 
boxes) . 



IV. RJMCMC EFFICIENCY 

In this section, we demonstrate the efficiency of the kD- 
interpolatcd jump proposal on a toy model-comparison 
problem. We draw N = 100 simulated data points from 
a N(0, 1) Gaussian distribution, and then ask whether 
these data are better described by a model where they are 
Gaussian distributed with unknown mean \i and standard 
deviation a 



p{x) 



1 



2tT(7 



exp 



2a 2 



(8) 



or by a model where they are Cauchy distributed with 
mode a and width ft 

1 



p(x) 



7T/3(l+(^ 



(9) 



We take priors on fi and a to be uniform in [—1, 1], and 
priors in a and /3 to be uniform in [0.5, 1.5]. With a data 
set of 100 points, the relative uncertainty in determining 
the parameters of the underlying distribution is approx- 
imately 10%, so we expect the posterior probabilities in 
the (/!, a) and {a, /3) spaces to occupy only a few percent 
of the prior volume. The Cauchy distribution is much 
broader than the Gaussian (it has no finite moments), so 
with uniform model priors, the posterior probability for 
the Gaussian model over the Cauchy model is extremely 
large: 



p(Gaussian|cf) g 
p( Cauchy | d) 



(10) 
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FIG. 2. The inter-model acceptance rate versus the number 
of points per box when the kD-tree neighborhood search is 
truncated. As the number of points per box increases, and 
the interpolation becomes less accurate, the acceptance rate 
falls, asymptoting to the rate for naive draws from the uniform 
prior (about 5% for this data set). 



In order to ensure that the RJMCMC produces samples 
in the Cauchy model at all, we impose a model prior 
that favors the Cauchy model by 5 x 10 8 relative to the 
Gaussian. The evidence ratio between the models for our 
chosen data set with these priors is 



p(Gaussian|o?) _ ^ r 
p(Cauchy|d) 



(11) 



yielding a theoretical maximum acceptance rate of inter- 
model jumps of (1 + l/r)/2 = 0.93. 

We obtain 10 4 single-model MCMC samples by inde- 
pendently running MCMC within each model, and use 
the kD-trcc interpolation method described above to pro- 
pose inter-model jumps in an RJMCMC. The acceptance 
rate of inter-model jumps is approximately 0.8. To ex- 
plore how the efficiency of the method degrades as the 
interpolation becomes less accurate, we artificially trun- 
cated the kD tree with higher and higher numbers of 
points in each box (this can be accomplished during the 
neighborhood search phase by stopping the search for 
a box when one is found containing the desired num- 
ber of points). For each truncation choice, we performed 
an RJMCMC with the resulting interpolated jump pro- 
posal. The acceptance rate is plotted against the number 
of single-model MCMC points per box (kD-tree leaf) in 
Figure [5] The more points in each leaf of the tree when 
the search is truncated, the lower the acceptance prob- 
ability; when points are drawn from the top level of the 
tree, the acceptance probability asymptotes to the naive 
draw from the prior (~ 5%). 

The relative error on the determination of the Bayes 
factor (evidence ratio) scales with \j \J ^transitions, where 
^transitions is the number of inter-model transitions in 
the RJMCMC. Thus, as the acceptance rate of inter- 
model jumps goes down, the RJMCMC must run longer 



to achieve a desired accuracy in the evidence ratio. By 
boosting the acceptance rate of inter-model jumps, the 
interpolation method described above can dramatically 
improve the runtime of an RJMCMC. 



V. EXAMPLES AND OTHER USES 

RJMCMC with efficient PDF interpolation via kD 
trees can be used in a large variety of problems in physics 
and astronomy, whenever Bayesian analysis is used to 
perform model selection. Below, we describe several sce- 
narios in which we have successfully applied this tech- 
nique. Moreover, PDF interpolation with a kD tree 
can be extremely useful in other contexts, beyond a 
reversible-jump MCMC. We suggest two examples be- 
low: generating efficient jump proposal distributions in a 
single-model MCMC and convergence tests. 

We have successfully employed the RJMCMC tech- 
nique described above when evaluating several alterna- 
tive models for the distribution of the masses of black 
holes in X-ray binaries We performed a Bayesian 

analysis of the mass distribution of stellar-mass black 
holes using the observed masses of 15 low-mass X-ray 
binary systems undergoing Roche lobe overflow and five 
high-mass, wind-fed X-ray binary systems. We consid- 
ered ten different mass distribution models: Gaussian, 
double Gaussian, power law, exponential decay, log nor- 
mal, and 1-, 2-, 3-, 4-, and 5-bin histograms. Each model 
was described by between two and six parameters. Model 
selection using RJMCMC with kD trees allowed us to de- 
termine that the mass distribution of the low-mass sys- 
tems is best fit by a power-law, while the distribution 
of the combined sample is best fit by the exponential 
model. Based on the model selection, we were able to 
determine which models would provide the best infor- 
mation about astrophysically relevant parameters, such 
as the minimum black hole mass. We were also able to 
conclude that the low-mass subsample is not consistent 
with being drawn from the distribution of the combined 
population 

Gravitational- wave astronomy provides the setting for 
another ongoing study. Interesting triggers from the 
LIGO [2{| and Virgo [2l[ gravitational-wave detector 
pipeline that searches for compact binary coalescences 
are followe d up with Bayesian parameter estimation tools 
(see, e.g., j22H24| ). In general, several possible models 
for the data are considered, including a pure noise model, 
noise superimposed on a non-spinning gravitational- wave 
signal, or noise superimposed on a gravitational wave sig- 
nal that includes significant spin or spins in the binary 
components. The models can include a large number of 
parameters (up to 15 for the model with two spinning 
components [25]). and RJMCMC with interpolation via 
kD trees is again successful at computing the model evi- 
dences. 

As discussed in Section III Bl finding an efficient jump 
proposal distribution can be a challenge even for a single- 
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model MCMC. This is particularly difficult when the pa- 
rameter space is multi-modal, in which case the Markov 
chain can get stuck on individual high-posterior islands in 
the parameter space. Traversals of low-posterior oceans 
by a series of short jumps are extremely unlikely, and ran- 
dom long jumps arc almost always rejected, making sam- 
pling very inefficient. Essentially, such isolated islands in 
parameter space behave as separate models, and weight- 
ing them by the number of samples in each peak then 
corresponds to model selection. Ideally, an analytical 
understanding of the model would allow for the creation 
of jump proposal distributions that combine short jumps 
designed to explore individual islands with directed long 
jumps that are likely to end up on neighboring islands. 
However, it is not always possible to gain such under- 
standing short of performing the MCMC itself. One so- 
lution lies in running multiple chains that explore the 
parameter space, perhaps getting stuck on local maxima, 
and then combining the points and interpolating with a 
kD tree to create a jump proposal distribution. Although 
the chains in the first iteration will not have transitioned 
between the peaks, and the samples will therefore not 
weight the separate peaks appropriately, this will ensure 
that all of the modes (islands) are included in the jump 
proposal distribution for a second stage. This second 
stage can employ an interpolated jump proposal exactly 
as in the model selection discussed above. Other mod- 
ifications to this technique, which we are currently in- 
vestigating, include multiple iterative stages, with the 
interpolated samples from each previous stage used as a 
jump proposal distribution for the next stage, until the 
posterior PDF is read off from the final stage; and the 
use of higher temperatures in the early-stage chains to 
improve sampling. 

An example of a multi-modal parameter space where 
chains can get stuck on high-posterior islands comes from 
antipodal sky location degeneracies for networks of three 
ground-based gravitational-wave interferometers or for 
low-frequency gravitational wave signals in the proposed 
space-borne instrument LISA [26[. Although PDFs pro- 
duced by MCMC chains will generally show both lo- 
cations on the sky, the chain length may not be suffi- 
ciently long in practice to include enough jumps between 
the two high-likelihood antipodal locations. In general, 
the relative weight of two separated peaks is determined 
with a fractional error that scales as 1/ -^-^transitions, 
where ^transitions is the number of transitions between 
the peaks. By increasing the number of proposed tran- 



sitions, the convergence of such PDFs can be enhanced 
using the kD-tree interpolated proposal. As described 
above, one could test the convergence of the PDF by 
running a followup MCMC with a jump proposal distri- 
bution from an interpolation of the first original PDF. 
This follow-up MCMC would attempt large global jump 
proposals rather than small local ones, meaning that 
it could rapidly accumulate sufficient points to deter- 
mine whether the original run spent the correct fractional 
amounts of time in the various modes. 

VI. CONCLUSION 

The need to compare evidences for multiple models 
arises in a large variety of physical and astronomical 
contexts. In this paper, we described a new technique 
that allows for efficient evidence computations via a 
Reversible- Jump Markov chain Monte Carlo. This tech- 
nique solves the usual problem of finding good inter- 
model jump proposals in an RJMCMC by using a kD- 
tree to quickly and accurately interpolate an approximate 
posterior PDF from a single-model MCMC run, and then 
using this interpolated PDF to propose efficient inter- 
model jumps. 

We demonstrated the efficiency of this technique on a 
toy model-comparison problem described in Section IIVI 
Wc also successfully applied this technique to the prob- 
lem of selecting the best model for the observed distri- 
bution of black-hole X-ray binaries, as described in Sec- 
tion [V] and Ref. . In addition to model comparison, 
the PDF interpolation described here can be useful in 
multi-step MCMCs to improve the sampling efficiency 
by selecting better jump proposal distributions or to test 
MCMC convergence. 

We have made our implementation of the tech- 
nique described in this paper publicly available online 
at http : //github . com/f arr/mcmc-ocaml, and welcome 
readers to take advantage of this toolkit. 



ACKNOWLEDGMENTS 

We are grateful to Neil Cornish for interesting discus- 
sions. IM acknowledges support from the NSF Astron- 
omy and Astrophysics Postdoctoral Fellowship, award 
AST-0901985. WF acknowledges support from NSF 
grant AST0908930 and NASA grant NNX09AJ56G. 



[1] J. Skilling, m \AIP Conf. Ser.\ Vol. 735, edited by R. Fis- 
cher, R. Preuss, & U. V. Toussaint (2004) pp. 395-405. 
[2] J. Skilling, Bayesian Analysis 1, 833 (2006). 
[3] F. Feroz, M. P. Hobson, and M . Bridges, 



Mon. Not. R. Astron. Soc. 398, 1601 (2009) 
arXiv:0809.3437 



[4] M. A. Newton and A. E. Raftery, J. R. Stat. Soc. B 56, 
3 (1994). 

[5] S. Chib, J. Am. Stat. Assoc. 90, 1313 (1995). 
[6] R. van Haasteren, ArXiv e-prints 



arXiv:091 1.2150 [astro-ph.IM; 



[7] D. 



Earl 



and 



M. 



W. 



|Phys. Chem. Chem. Phys. 7, 3910 (2005)] 



(2009), 



Deem, 



8 



|arXiv:physics/0508111| 
[8] M. D. Weinberg, Submitted to Bayesian Analysis (2009), 

I arXiv:091 1.17771 [20] 
[9] P. J. Green, Biometrika 82, 711 (1995). 
[10] T. B. Littenbe rg and N. J. Cor nish, Phys. Rev. D 80, 

063007 f2009), larXiv:0902.0368l [21] 
[11] W. M. Farr, N. Sravan, A. Cantrell, L. Kreidberg, C. D. [22] 

Bailyn, I. Mandel, and V. Kalogera, Submitted to the 

Astrophys. J. (20101. larXiv:1011. 14591 
[12] Y. Ascasibar an d J. Binney, Mon. Not. R . Astron. Soc. [23] 

356, 872 (2005), |arXiv:astro-ph/0409233| 
[13] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, 

Markov chain Monte Carlo in practice (Chapman & 

Hall/CRC, 1996). [24] 
[14] N. Metropolis, A. W. Rosenbluth, M. N. 

Rosenbluth, A. H. Teller and E. Teller, 

|J. Chem. Phys. 21, 1087 (1953)| [25] 
[15] W. K. Hastings, Biometrika 57, 97 (1970). 
[16] G. Voronoi, Journal fur die Reine und Angewandte Math- 

ematik 133, 97 (1907). 
[17] M. de Berg, O. Cheong, M. van Kreveld, and M. Over- [26] 

mars, Computational Geometry, 3rd ed. (Springer, 2008). 
[18] V. Gaede and O. Giinther, ACM Computing Surveys 30, 

170 (1998). 

[19] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. 
Flannery, Numerical Recipes 3rd Edition: The Art of Sci- 



entific Computing (Cambridge University Press, 2007) 
http://www.nr.com. 

B. Abbott et al. (LI GO Scientific), 



Rept. Prog. Phys. 72, 076901 (2009) 
arXiv:arXiv:0711.3041 [gr-qcj| 

Acernese et al, Class. Quant. Grav.. 
Rover, R. Meyer, and 



F 

C 

tensen 



N. 



Chris- 



|arXiv:gr-qc/0602067 



Class. Quant. Grav. 23, 4895 (2006) 



M. V. van der Sluys, C. Rover, A. Stroeer, V. Ray- 
mond, I. Mandel, N. Christensen, V. Kalogera, R. Meyer, 
and A. Vecchio, |Astrophys. J. Lett. 688, L61 (2008) 



arXiv:0710.1 897 
J. Veitch and 

Phys. Rev. D 81 062003~ (2010) , 



Vecchio, 



arXiv:0911.3820 [astro-ph.CO]| 

V. van der 
C. Rover, 



V. Raymond, M. 
del, V. Kalogera, 
tensen, 



Sluys, 
and 



N. 



Man- 
Chris- 



Class. Quant. Grav. 27, 114009 (2010)] 



arXiv:0912.3746 [gr-qc] 



P. Bender et al, LISA Pre-Phase A 
Second Edition, Tech. Rep. MPQ233 
http: / /list .caltech.edu/lib / exe/fetch.php?media= 



Report; 
(1998) 
-documents:early:pr< 



